Earthquake

Aliénor Franck de Préaumont

WOULD IT BE POSSIBLE TO FORECAST THE MAGNITUDE OF AN EARTHQUAKE?

INTRODUCTION

Earthquakes between 1965 and 2016 in the world

Biggest earthquakes (magnitude higher than 8 on the Richter scale) between 1965 and 2016.

As these plots show, earthquakes follow the tectonic plates. Plate boundaries are made up of many faults, and earthquakes occur when there is rapid motion along them. Faults in the Earth's crust can stay locked for a long time, but tectonic stress eventually makes the rocks on either side of a fault slip. The ground shakes because seismic waves are released when underground rocks suddenly break.

S waves, P waves, and Rayleigh and Love waves cause the ground shaking. P waves arrive first (travelling at about 15,000 miles per hour) and make buildings vibrate. S waves arrive afterwards and damage buildings. P and S waves are high-frequency waves; the low-frequency surface waves (Rayleigh and Love) arrive last.

Would it be possible to predict the magnitude of an earthquake in order to construct buildings with earthquake-resistant designs?

This dataset, found on Kaggle (https://www.kaggle.com/usgs/earthquake-database/code), can help us find out whether we can answer this question.

We will first look at data visualisation and then try modelling with ensembles and neural networks.

The different features are:

Date: the date when the earthquake occurred, recorded by a seismometer. We analyse the timeline from 1965 to 2016.

Latitude and Longitude: the geospatial coordinates of the earthquake.

Type: the type of shaking event we are analysing. It can be: earthquake, nuclear explosion, explosion, or rock burst.

A rock burst is "a violent expulsion of rock from the walls of a mine opening caused by heavy pressure on brittle rocks in deep mines where mining has deprived the rock of support on one side" (https://www.merriam-webster.com/dictionary/rock%20burst). A nuclear explosion is the "result of the rapid release of energy from a high-speed nuclear reaction" (https://en.wikipedia.org/wiki/Nuclear_explosion).

Depth: the depth of the earthquake, ranging from 0 to about 700 km below the Earth's surface.

Depth Error: since most earthquakes occur deep within the crust, an error of +/- 1 or 2 km is usually irrelevant; in other words, it is a small error when the depth is something like 13 km. If the earthquake is relatively shallow, however, it becomes more of an issue. A negative depth can sometimes be an artifact of the poor resolution for a shallow event. (https://www.usgs.gov/faqs/what-does-it-mean-earthquake-occurred-a-depth-0-km-how-can-earthquake-have-a-negative-depth?qt-news_science_products=0#qt-news_science_products)

Depth Seismic Stations: the depth of the seismic stations. Seismic stations are installed underground, as the diagram below shows:


Magnitude will be our output variable. Magnitude is recorded by a seismometer and refers to the seismic energy released by an earthquake; it is a measure of the amplitude of the shaking waves generated by the event.

Magnitude Type feature:

Since 2005 the International Association of Seismology and Physics of the Earth's Interior (IASPEI) has standardized the measurement procedures and equations for the principal magnitude scales: ML, Ms, mb, mB and mbLg.

According to http://www.earthquakes.bgs.ac.uk/education/faqs/faq15.html, surface wave magnitude (Ms) is based on the maximum amplitude of the surface wave, which has a period of 20 ± 2 s. It is used for observations near the earthquake epicentre, where the surface wave is larger than the body wave. This scale applies to any epicentral distance or type of seismograph.

Body wave magnitude (mb) is calculated from the body waves (P, PP, S) and is usually used at larger distances from the earthquake epicentre (P-wave attenuation with distance is less than that of surface waves). It can be used for earthquakes of any depth.

Moment magnitude (Mw) is considered the best scale to use for larger earthquakes as the Ms saturates at about magnitude 8. Moment magnitude is measured over the broad range of frequencies present in the earthquake wave spectrum rather than the single frequency sample that the other magnitude scales use.

For comparison purposes, a magnitude 5 ML earthquake is equivalent to the explosion of 1,000 tons of TNT whereas a magnitude 6 ML earthquake is the energy equivalent of 30,000 tons of TNT or a 30 kilotonne nuclear explosion.
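These TNT figures are consistent with the standard magnitude-energy relation, in which released seismic energy grows as 10^(1.5·M): one magnitude unit multiplies the energy by about 32, close to the factor of 30 between the two TNT amounts quoted above. A quick check in R (the helper function is ours, for illustration only):

```r
# Energy scales as E ~ 10^(1.5 * M), so a step of one magnitude
# unit multiplies the released energy by 10^1.5 (about 31.6).
energy_ratio <- function(m1, m2) 10^(1.5 * (m2 - m1))

energy_ratio(5, 6)   # roughly the 1,000 t -> 30,000 t TNT jump above
```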

Magnitude Error: the difference between the exact value and the measurement approximation. Magnitude Seismic Stations: the magnitude recorded at the seismic station.

Azimuthal Gap: the largest angle, as seen from the epicentre, between two azimuthally adjacent seismic stations that recorded the event. In general, the smaller this gap, the more reliable the calculated position of the earthquake.


Horizontal Distance: the distance from the epicentre to the nearest station (in km). Horizontal Error: the difference between the exact value and the measurement approximation. For instance, a 1 m error signifies the best relative location with respect to similar events, 2 m is the next level of location quality, and so on.

Root Mean Square: for seismic integration, RMS is the most commonly used post-stack amplitude attribute; it computes the square root of the sum of squared amplitude values divided by the number of samples within the specified window.
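That definition is just the square root of the mean of the squared samples; a minimal R sketch:

```r
# Root mean square over a window of samples:
# square each value, average, then take the square root.
rms <- function(x) sqrt(mean(x^2))

rms(c(3, 4))   # sqrt((9 + 16) / 2) = sqrt(12.5)
```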

ID, Source, Location Source, Magnitude Source and Status refer to the earthquake reference, the organization that recorded the earthquake, where it was recorded, which organization recorded the magnitude and how it did so. We won't analyse these features.

DATA PREPROCESSING, NORMALITY AND CORRELATION CHECK

Analysis Y output - Magnitude

mean_magn median_magn min_magn max_magn
5.882531 5.7 5.5 9.1

We can see here the distribution curve of the output variable, Magnitude. Magnitude is not normally distributed; we do not recognize the well-known bell-shaped curve. The mean is 5.88, the median is 5.7, the minimal value is 5.5 and the maximal one is 9.1. Since the mean exceeds the median and the values tail off towards 9.1, the distribution is right-skewed rather than symmetric.
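A quick numeric check of that skew, using only the summary statistics above: when the mean exceeds the median, the distribution leans to the right.

```r
# Values taken from the summary table above.
mean_magn   <- 5.882531
median_magn <- 5.7

mean_magn - median_magn > 0   # TRUE: mean > median, i.e. right-skewed
```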


DENSITY FUNCTIONS

Now let's check the density function of each variable.

None of the variables is normally distributed. Magnitude Error and Root Mean Square are closest to a normal bell-shaped curve. It will be difficult to correct the other variables in order to train models that are sensitive to the normality condition.

We could think about log transforming. For modelling, we can instead use ensembles, which are not sensitive to the normality condition.
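A sketch of what such a log transform would look like, on a synthetic right-skewed sample (log1p(x) = log(1 + x), which is safe at zero; the data are made up for illustration):

```r
# Synthetic right-skewed sample standing in for a skewed feature.
set.seed(42)
x     <- rexp(1000, rate = 0.1)
x_log <- log1p(x)

# The mean-median gap shrinks after the transform, a rough sign
# that the distribution has become more symmetric.
c(raw = mean(x) - median(x), logged = mean(x_log) - median(x_log))
```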

Transform to factor and date

We need to transform Date in a Date format and transform Type and Magnitude Type as factor.
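A self-contained sketch of those conversions on a toy frame with the same column names (assuming month/day/year date strings, as in the Kaggle file):

```r
# Toy stand-in for the Kaggle frame.
EQ <- data.frame(Date           = c("01/02/1965", "01/04/1965"),
                 Type           = c("Earthquake", "Nuclear Explosion"),
                 Magnitude.Type = c("MW", "MB"),
                 stringsAsFactors = FALSE)

EQ$Date           <- as.Date(EQ$Date, format = "%m/%d/%Y")
EQ$Type           <- as.factor(EQ$Type)
EQ$Magnitude.Type <- as.factor(EQ$Magnitude.Type)

str(EQ)   # Date is now of class "Date"; the other two are factors
```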

Transform NA: we need to check whether there are NA values. The naniar library is useful here and shows us a plot of the NA values for each variable.

Here we see that for Depth Error, Depth Seismic Stations, Magnitude Error, Magnitude Seismic Stations, Azimuthal Gap, Horizontal Distance, Horizontal Error and Root Mean Square, between 68% and 98% of the data are missing. We can replace the missing values with the mean of each variable.
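A minimal sketch of that mean imputation on a toy column (the same pattern applies to each mostly-missing numeric variable):

```r
# Toy numeric column with missing values.
depth_error <- c(2.1, NA, 4.3, NA, 8.6)

# Replace each NA with the mean of the observed values.
depth_error[is.na(depth_error)] <- mean(depth_error, na.rm = TRUE)

depth_error   # the two NAs are now 5.0 (mean of 2.1, 4.3, 8.6)
```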

We remove Time, ID, Source, Location Source, Magnitude Source and Status, which seem to be irrelevant variables.

## 'data.frame':    23412 obs. of  15 variables:
##  $ Date                      : Date, format: "1965-01-02" "1965-01-04" ...
##  $ Latitude                  : num  19.25 1.86 -20.58 -59.08 11.94 ...
##  $ Longitude                 : num  145.6 127.4 -174 -23.6 126.4 ...
##  $ Type                      : Factor w/ 4 levels "Earthquake","Explosion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Depth                     : num  132 80 20 15 15 ...
##  $ Depth.Error               : num  4.99 4.99 4.99 4.99 4.99 ...
##  $ Depth.Seismic.Stations    : num  275 275 275 275 275 ...
##  $ Magnitude                 : num  6 5.8 6.2 5.8 5.8 6.7 5.9 6 6 5.8 ...
##  $ Magnitude.Type            : Factor w/ 11 levels "","MB","MD","MH",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Magnitude.Error           : num  0.0718 0.0718 0.0718 0.0718 0.0718 ...
##  $ Magnitude.Seismic.Stations: num  48.9 48.9 48.9 48.9 48.9 ...
##  $ Azimuthal.Gap             : num  44.2 44.2 44.2 44.2 44.2 ...
##  $ Horizontal.Distance       : num  3.99 3.99 3.99 3.99 3.99 ...
##  $ Horizontal.Error          : num  7.66 7.66 7.66 7.66 7.66 ...
##  $ Root.Mean.Square          : num  1.02 1.02 1.02 1.02 1.02 ...

CORRELATION

We notice that most of the variables are only weakly correlated with each other, so we don't need to correct for this in further analysis. The main exception is that Magnitude Error and Horizontal Error have a positive correlation of about 0.5.
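Such a correlation check can be reproduced with cor() on the numeric columns; a toy sketch on synthetic variables (not the earthquake frame):

```r
# Synthetic variables: x2 is built from x1, x3 is independent noise.
set.seed(1)
x1 <- rnorm(200)
x2 <- 0.5 * x1 + rnorm(200)   # correlated with x1
x3 <- rnorm(200)              # independent of both

round(cor(cbind(x1, x2, x3)), 2)   # x1-x2 correlate, x3 with neither
```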

FURTHER DATA VISUALISATIONS

Explosion, Nuclear Explosion and Rock Burst represent only a small share of the dataset; we notice just a few points for each of them. The main part consists of the earthquake category.

We see no particular correlation between Depth and Magnitude: the magnitude can be very high at a small depth (between 0 and 100 km).

The earthquake with the highest magnitude has a depth of only around 40 km, while the deepest earthquake (around 700 km) has a magnitude of only 5.5. The deepest earthquakes thus tend to keep a low magnitude.

We also notice a few earthquakes with a depth of around 300 km.

Nuclear explosions produce magnitude levels between 5.5 and 7 but almost no depth (which makes sense, since they occur at or near the surface).

Here we see the different types of magnitude, placed on the plot depending on their depth and on their magnitude.

There are far more earthquakes from the MW type family…

MS refers to surface wave magnitude; these earthquakes occur in the uppermost part of the Earth's crust.

ML is the local magnitude; it refers to the Richter magnitude scale. We see here that the observations are concentrated along the magnitude axis rather than the depth axis.

MB refers to the body wave magnitude scale. Body waves are made up of P waves (first to arrive) and S waves (second to arrive); MB only uses P waves.

MW is the moment magnitude; it is based on an earthquake's seismic moment and measures how much the fault slips. This magnitude type is more directly related to the energy involved in an earthquake, and it is not tied to a particular magnitude or depth range on the plot.

The MW type is by far the most used nowadays to measure an earthquake, which is why there are many more observations with MW. For example, the US Geological Survey uses it to report large earthquakes. MWB, MWC, MWR and MWW are subcategories of moment magnitude.

When the seismic station is far away from the epicentre, the recorded depth and magnitude are not as high; the earthquake's measurements are probably more limited as the distance to the epicentre grows.

The azimuthal gap is negatively correlated with both magnitude and depth. We need an azimuthal gap no higher than about 150°; beyond that, the depth and magnitude estimates are close to 0.

Time series

EARTHQUAKE TYPE MOVING

Here we see the different types of events appearing over time. Type 1 refers to the earthquake category, which constitutes the main part.

Type 2 is the explosion category and Type 3 the nuclear explosions. The magnitudes of explosions and nuclear explosions are not as high as those of earthquakes, and all of them happened before 1995.

We also notice that the magnitude types have evolved over time (MW, MS, ML); we can see clear differences between the colours. This can be explained by the fact that, for example, the Mw type was developed in 1977-1979 (Kanamori (1977) and Hanks & Kanamori (1979)), which is why we cannot see the MW magnitude type before then.

MAGNITUDE TYPE MOVING

We clearly see the nuclear explosions in green; the rest of the observations are in black. We recognize the distribution curve of each type in these plots. The MB, MW, MWB, MWC and MWW magnitude types have the densest distribution curves, which can be explained by the fact that the MW magnitude type is now a standard for classifying earthquakes.

DATE MOVING

We recognize here the magnitude types in colours; most of the observations imply a magnitude between 5 and 7 on the Richter scale.

We notice that the strongest earthquakes happened in recent years (after 2000).

Here is an extra plot made with plotly showing the characteristics of the largest earthquakes (magnitude higher than 8 on the Richter scale). We notice that the biggest earthquakes happened after 1990. Before 1990, the earthquakes were neither as deep nor as strong in magnitude: their depth was between 20 and 40 km, whereas after 1990 two very deep earthquakes occurred, at 631.3 km and 598.1 km.

After 1990, the magnitude increased significantly: in the seventies the magnitude was around 8.2 (max 8.7), whereas around 2010 the magnitude reached 9.1 twice.

ANOVA

"What Does ANOVA Do? It compares the means (of an interval-ratio dependent variable) for the groups defined by the categories of an independent variable measured at the nominal or ordinal level.

The null hypothesis is that all of the group means are equal to each other — that there are no differences. μ1 = μ2 = μ3 = μ4 etc. for however many groups there are. Rejecting the null hypothesis implies that, in at least one pair of group means, the means were different from each other.

Remember: Variances are measures of deviations from means. We are going to use variances to compare means.

Is there more variability between the groups than variability within the groups? We will compute the ratio of between-groups variance to within-groups variance."
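This between- versus within-groups ratio is exactly what aov() computes; a self-contained toy example on synthetic data (not the earthquake frame):

```r
# Two synthetic groups with slightly different means.
set.seed(1)
d <- data.frame(y = c(rnorm(50, mean = 0), rnorm(50, mean = 0.8)),
                g = factor(rep(c("A", "B"), each = 50)))

fit <- aov(y ~ g, data = d)
summary(fit)   # F value = between-group / within-group mean squares
```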

##                Df Sum Sq Mean Sq F value Pr(>F)
## Type            3      0 0.09458   0.528  0.663
## Residuals   23408   4190 0.17900

As the p-value (0.663) is higher than the significance level 0.05, we cannot conclude that there is any significant difference in mean Magnitude between the Type groups; accordingly, no row is highlighted with "*" in the model summary.

##                   Df Sum Sq Mean Sq F value Pr(>F)    
## Magnitude.Type    10    232  23.210   137.2 <2e-16 ***
## Residuals      23401   3958   0.169                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the Magnitude Type groups, as highlighted with "***" in the model summary.

We now use a Tukey test in order to check between which groups the differences lie.
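In R this is TukeyHSD() applied to the aov fit; a toy example with three synthetic groups (the group labels merely echo the magnitude types, the data are made up):

```r
# Three synthetic groups with different means.
set.seed(2)
d <- data.frame(y = c(rnorm(40, 5.7), rnorm(40, 5.9), rnorm(40, 6.3)),
                g = factor(rep(c("MB", "MS", "MW"), each = 40)))

tk <- TukeyHSD(aov(y ~ g, data = d))
tk$g   # one row per pair: diff, lwr, upr, p adj
```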

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Magnitude ~ Magnitude.Type, data = EQ3)
## 
## $Magnitude.Type
##                diff         lwr          upr     p adj
## MB-     -0.02371001 -0.78834894  0.740928929 1.0000000
## MD-      0.26000000 -0.67611434  1.196114339 0.9983571
## MH-      0.83333333 -0.13348140  1.800148065 0.1678547
## ML-      0.10800866 -0.67107285  0.887090171 0.9999972
## MS-      0.28769291 -0.47731457  1.052700390 0.9814996
## MW-      0.22712769 -0.53735493  0.991610302 0.9971057
## MWB-     0.20061568 -0.56418477  0.965416128 0.9989942
## MWC-     0.15150938 -0.61302699  0.916045749 0.9999185
## MWR-    -0.07589744 -0.88312427  0.731329403 0.9999999
## MWW-     0.30200706 -0.46290504  1.066919163 0.9736558
## MD-MB    0.28371001 -0.25718680  0.824606808 0.8419148
## MH-MB    0.85704334  0.26459923  1.449487447 0.0001698
## ML-MB    0.13171866 -0.02068638  0.284123712 0.1648674
## MS-MB    0.31140292  0.27272812  0.350077714 0.0000000
## MW-MB    0.25083769  0.22451350  0.277161883 0.0000000
## MWB-MB   0.22432568  0.18998874  0.258662621 0.0000000
## MWC-MB   0.17521938  0.14737774  0.203061027 0.0000000
## MWR-MB  -0.05218743 -0.31271471  0.208339851 0.9999100
## MWW-MB   0.32571707  0.28897716  0.362456976 0.0000000
## MH-MD    0.57333333 -0.22830709  1.374973760 0.4322430
## ML-MD   -0.15199134 -0.71311936  0.409136675 0.9986737
## MS-MD    0.02769291 -0.51372476  0.569110580 1.0000000
## MW-MD   -0.03287231 -0.57354811  0.507803483 1.0000000
## MWB-MD  -0.05938432 -0.60050943  0.481740781 0.9999997
## MWC-MD  -0.10849062 -0.64924242  0.432261179 0.9999087
## MWR-MD  -0.33589744 -0.93549048  0.263695607 0.7785584
## MWW-MD   0.04200706 -0.49927584  0.583289957 1.0000000
## ML-MH   -0.72532468 -1.33629548 -0.114353872 0.0062547
## MS-MH   -0.54564042 -1.13856012  0.047279271 0.1043143
## MW-MH   -0.60620565 -1.19844798 -0.013963309 0.0395130
## MWB-MH  -0.63271766 -1.22537021 -0.040065103 0.0249288
## MWC-MH  -0.68182395 -1.27413568 -0.089512229 0.0096775
## MWR-MH  -0.90923077 -1.55570775 -0.262753788 0.0003125
## MWW-MH  -0.53132627 -1.12412290  0.061470357 0.1276564
## MS-ML    0.17968425  0.02544080  0.333927702 0.0082027
## MW-ML    0.11911903 -0.03249978  0.270737843 0.2874734
## MWB-ML   0.09260702 -0.06060632  0.245820357 0.6868418
## MWC-ML   0.04350072 -0.10838891  0.195390347 0.9978701
## MWR-ML  -0.18390609 -0.48418890  0.116376716 0.6690539
## MWW-ML   0.19399840  0.04022870  0.347768109 0.0024118
## MW-MS   -0.06056522 -0.09601527 -0.025115173 0.0000018
## MWB-MS  -0.08707723 -0.12882371 -0.045330755 0.0000000
## MWC-MS  -0.13618353 -0.17277452 -0.099592547 0.0000000
## MWR-MS  -0.36359035 -0.62519732 -0.101983375 0.0004000
## MWW-MS   0.01431415 -0.02943015  0.058058453 0.9936478
## MWB-MW  -0.02651201 -0.05717129  0.004147269 0.1642780
## MWC-MW  -0.07561831 -0.09877264 -0.052463976 0.0000000
## MWR-MW  -0.30302512 -0.56309325 -0.042956997 0.0081793
## MWW-MW   0.07487937  0.04155090  0.108207845 0.0000000
## MWC-MWB -0.04910630 -0.08107793 -0.017134662 0.0000409
## MWR-MWB -0.27651311 -0.53751406 -0.015512169 0.0272086
## MWW-MWB  0.10139138  0.06143078  0.141351982 0.0000000
## MWR-MWC -0.22740681 -0.48763292  0.032819288 0.1529500
## MWW-MWC  0.15049768  0.11595812  0.185037241 0.0000000
## MWW-MWR  0.37790450  0.11657656  0.639232428 0.0001712

diff is the difference between the means of the two groups; lwr and upr are the lower and upper end points of the confidence interval at 95% (the default); p adj is the p-value after adjustment for the multiple comparisons.

For example, between MS (surface wave magnitude) and MW (moment magnitude) the adjusted p-value is 0.0000018, so there is a significant difference between the two groups.

MODELLING WITH THE h2o library

ENSEMBLES

We use ensembles and trees to predict the Magnitude because they are not sensitive to the normality condition, so we have a better chance of getting significant results here. We use the h2o package so that we can later compare with deep learning results.

A random forest is based on generating a large number of decision trees, each constructed using a different subset of the training set. These subsets are usually selected by sampling at random and with replacement from the original data set.

library(h2o)
localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     C:\Users\alien\AppData\Local\Temp\RtmpYpXkaI\file94842411c65/h2o_alien_started_from_r.out
##     C:\Users\alien\AppData\Local\Temp\RtmpYpXkaI\file94813ef49a8/h2o_alien_started_from_r.err
## 
## 
## Starting H2O JVM and connecting:  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 seconds 124 milliseconds 
##     H2O cluster timezone:       Europe/Berlin 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.32.1.3 
##     H2O cluster version age:    3 months and 13 days !!! 
##     H2O cluster name:           H2O_started_from_R_alien_lcl300 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.54 GB 
##     H2O cluster total cores:    12 
##     H2O cluster allowed cores:  12 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4 
##     R Version:                  R version 4.0.5 (2021-03-31)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (3 months and 13 days)!
## Please download and install the latest version from http://h2o.ai/download/
eartQ3.hex <- as.h2o(EQ3)
## Warning in use.package("data.table"): data.table cannot be used without R
## package bit64 version 0.9.7 or higher. Please upgrade to take advangage of
## data.table speedups.

see features of the dataset

h2o.describe(eartQ3.hex)
##                         Label Type Missing Zeros PosInf NegInf          Min
## 1                        Date time       3     1      0      0 -1.57680e+11
## 2                    Latitude real       0     1      0      0 -7.70800e+01
## 3                   Longitude real       0     0      0      0 -1.79997e+02
## 4                        Type enum       0 23232      0      0  0.00000e+00
## 5                       Depth real       0   170      0      0 -1.10000e+00
## 6                 Depth.Error real       0    44      0      0  0.00000e+00
## 7      Depth.Seismic.Stations real       0    29      0      0  0.00000e+00
## 8                   Magnitude real       0     0      0      0  5.50000e+00
## 9              Magnitude.Type enum       0     3      0      0  0.00000e+00
## 10            Magnitude.Error real       0     1      0      0  0.00000e+00
## 11 Magnitude.Seismic.Stations real       0    47      0      0  0.00000e+00
## 12              Azimuthal.Gap real       0     2      0      0  0.00000e+00
## 13        Horizontal.Distance real       0     0      0      0  4.50500e-03
## 14           Horizontal.Error real       0     0      0      0  8.50000e-02
## 15           Root.Mean.Square real       0     2      0      0  0.00000e+00
##             Max         Mean        Sigma Cardinality
## 1  1.483056e+12 7.299950e+11 4.552885e+11          NA
## 2  8.600500e+01 1.679033e+00 3.011318e+01          NA
## 3  1.799980e+02 3.963996e+01 1.255120e+02          NA
## 4  3.000000e+00           NA           NA           4
## 5  7.000000e+02 7.076791e+01 1.226519e+02          NA
## 6  9.129500e+01 4.993115e+00 2.127886e+00          NA
## 7  9.340000e+02 2.753641e+02 8.926709e+01          NA
## 8  9.100000e+00 5.882531e+00 4.230656e-01          NA
## 9  1.000000e+01           NA           NA          11
## 10 4.100000e-01 7.181957e-02 6.073216e-03          NA
## 11 8.210000e+02 4.894462e+01 2.082632e+01          NA
## 12 3.600000e+02 4.416353e+01 1.794560e+01          NA
## 13 3.787400e+01 3.992660e+00 1.407077e+00          NA
## 14 9.900000e+01 7.662759e+00 2.316764e+00          NA
## 15 3.440000e+00 1.022784e+00 1.623186e-01          NA

We simply passed the dataset to the h2o package, and here we see all of our variables described.

split data: 80% for training set // 20% for test set

splits <- h2o.splitFrame(data = eartQ3.hex,
                         ratios = c(0.8),  #partition data into 80% and 20% chunks
                         seed = 1234)
train <- splits[[1]]
test <- splits[[2]]

We split our dataset so that we can train the model and then test it.

create our random forest model

rf <- h2o.randomForest(x = c("Date", "Latitude", "Longitude", "Type", "Depth", "Depth.Error", "Depth.Seismic.Stations", "Magnitude.Type", "Magnitude.Error", "Magnitude.Seismic.Stations", "Azimuthal.Gap", "Horizontal.Distance", "Horizontal.Error", "Root.Mean.Square"),
                    y = c("Magnitude"),
                    training_frame = train,
                    model_id = "our.rf",
                    seed = 1234)

more details about random forest model

print(rf)
## Model Details:
## ==============
## 
## H2ORegressionModel: drf
## Model ID:  our.rf 
## Model Summary: 
##   number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1              50                       50             3054313        20
##   max_depth mean_depth min_leaves max_leaves mean_leaves
## 1        20   20.00000       3470       6373  4860.74000
## 
## 
## H2ORegressionMetrics: drf
## ** Reported on training data. **
## ** Metrics reported on Out-Of-Bag training samples **
## 
## MSE:  0.138575
## RMSE:  0.3722566
## MAE:  0.2650159
## RMSLE:  0.05135277
## Mean Residual Deviance :  0.138575

The h2o random forest automatically sets up a regression model, since the response variable Magnitude is numeric.

The number of trees generated here is 50 (the default setting). The number of internal trees counts the trees the model actually builds internally; for a regression model like this one it equals the number of trees.

The depth of the trees is 20; both the minimal and the maximal depth equal 20.

Each tree has between 3,470 and 6,373 leaves (about 4,861 on average).

plot(rf)

We see here the number of trees needed to get the best RMSE: after about 10 trees, the RMSE no longer decreases much. We notice, however, that each tree is relatively complex, with more than 3,000 leaves, which limits the model's parsimony and explainability even though it performs well.

model performance on the test set

rf_perf1 <- h2o.performance(model = rf, newdata = test)
print(rf_perf1)
## H2ORegressionMetrics: drf
## 
## MSE:  0.1288205
## RMSE:  0.3589157
## MAE:  0.2622424
## RMSLE:  0.04983777
## Mean Residual Deviance :  0.1288205

The performance of our model improves on the test set, which seems to be good news. It suggests that the model generalises well, without overfitting or other problems.

We check both the RMSE and the MAE. The RMSE squares the errors, so it is sensitive to outliers (we noted earlier that there were two really strong earthquakes after 2010), whereas the MAE is more robust to them; comparing the two therefore gives a sense of how much those outliers weigh on the model.
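The metrics h2o reports can be written out directly; a small sketch of RMSE and MAE on made-up values, showing how differently they punish an outlier:

```r
# Plain-R definitions of the two error metrics.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))

actual <- c(5.8, 5.9, 6.0, 9.1)   # one outlier-like value
pred   <- c(5.9, 5.8, 6.1, 6.0)

rmse(actual, pred)   # dominated by the 9.1 miss
mae(actual, pred)    # much less affected by it
```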

prediction on the test set

predictions <- h2o.predict(rf, test)
print(predictions)
##    predict
## 1 6.033752
## 2 5.950374
## 3 6.193097
## 4 5.871654
## 5 5.980598
## 6 5.926996
## 
## [4593 rows x 1 column]

Judging from the first rows of this prediction, the predicted magnitudes lie between roughly 5.9 and 6.2.

Let's try to compare the tree model with the deep learning one. H2o allows us to train a neural network in order to make a regression prediction.

DEEP LEARNING MODEL

We use a tree model and a neural network here because the data have non-linear relationships; these two approaches are well suited to our dataset. Neural networks allow us to make predictions with highly interconnected simulated "neurons".

Neural networks (also called “multilayer perceptron”) provide models of data relationships through highly interconnected, simulated “neurons” that accept inputs, apply weighting coefficients and feed their output to other “neurons” which continue the process through the network to the eventual output. Some neurons may send feedback to earlier neurons in the network. Neural networks are “trained” to deliver the desired result by an iterative (and often lengthy) process where the weights applied to each input at each neuron are adjusted to optimize the desired output.

Neural networks are often compared to decision trees because both methods can model data that have nonlinear relationships between variables, and both can handle interactions between variables.

We split our dataset into a training part (60% of the data), a validation part (20%) and a test set (20%).

splits <- h2o.splitFrame(eartQ3.hex, c(0.6,0.2), seed=1234)
train_split  <- h2o.assign(splits[[1]], "train.hex")
valid_split  <- h2o.assign(splits[[2]], "valid.hex") 
test_split   <- h2o.assign(splits[[3]], "test.hex") 
str(train_split)
## Class 'H2OFrame' <environment: 0x000000003e2085b8> 
##  - attr(*, "op")= chr "assign"
##  - attr(*, "id")= chr "train.hex"
##  - attr(*, "nrow")= int 14078
##  - attr(*, "ncol")= int 15
##  - attr(*, "types")=List of 15
##   ..$ : chr "time"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##  - attr(*, "data")='data.frame': 10 obs. of  15 variables:
##   ..$ Date                      : num  -1.58e+11 -1.57e+11 -1.57e+11 -1.57e+11 -1.57e+11 ...
##   ..$ Latitude                  : num  1.86 11.94 -13.4 27.36 -13.31 ...
##   ..$ Longitude                 : num  127.4 126.4 166.6 87.9 166.2 ...
##   ..$ Type                      : Factor w/ 4 levels "Earthquake","Explosion",..: 1 1 1 1 1 1 1 1 1 1
##   ..$ Depth                     : num  80 15 35 20 35 ...
##   ..$ Depth.Error               : num  4.99 4.99 4.99 4.99 4.99 ...
##   ..$ Depth.Seismic.Stations    : num  275 275 275 275 275 ...
##   ..$ Magnitude                 : num  5.8 5.8 6.7 5.9 6 6 5.9 8.2 5.5 5.6
##   ..$ Magnitude.Type            : Factor w/ 11 levels "","MB","MD","MH",..: 7 7 7 7 7 7 7 7 7 7
##   ..$ Magnitude.Error           : num  0.0718 0.0718 0.0718 0.0718 0.0718 ...
##   ..$ Magnitude.Seismic.Stations: num  48.9 48.9 48.9 48.9 48.9 ...
##   ..$ Azimuthal.Gap             : num  44.2 44.2 44.2 44.2 44.2 ...
##   ..$ Horizontal.Distance       : num  3.99 3.99 3.99 3.99 3.99 ...
##   ..$ Horizontal.Error          : num  7.66 7.66 7.66 7.66 7.66 ...
##   ..$ Root.Mean.Square          : num  1.02 1.02 1.02 1.02 1.02 ...
str(valid_split)
## Class 'H2OFrame' <environment: 0x0000000057d9a9a8> 
##  - attr(*, "op")= chr "assign"
##  - attr(*, "id")= chr "valid.hex"
##  - attr(*, "nrow")= int 4741
##  - attr(*, "ncol")= int 15
##  - attr(*, "types")=List of 15
##   ..$ : chr "time"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##  - attr(*, "data")='data.frame': 10 obs. of  15 variables:
##   ..$ Date                      : num  -1.58e+11 -1.57e+11 -1.55e+11 -1.55e+11 -1.55e+11 ...
##   ..$ Latitude                  : num  19.2 -20.6 51.3 51.8 51.8 ...
##   ..$ Longitude                 : num  146 -174 179 173 174 ...
##   ..$ Type                      : Factor w/ 4 levels "Earthquake","Explosion",..: 1 1 1 1 1 1 1 1 1 1
##   ..$ Depth                     : num  131.6 20 30.3 10 31.8 ...
##   ..$ Depth.Error               : num  4.99 4.99 4.99 4.99 4.99 ...
##   ..$ Depth.Seismic.Stations    : num  275 275 275 275 275 ...
##   ..$ Magnitude                 : num  6 6.2 8.7 5.7 5.7 5.8 6 6.4 5.7 5.8
##   ..$ Magnitude.Type            : Factor w/ 11 levels "","MB","MD","MH",..: 7 7 7 7 7 7 7 7 7 7
##   ..$ Magnitude.Error           : num  0.0718 0.0718 0.0718 0.0718 0.0718 ...
##   ..$ Magnitude.Seismic.Stations: num  48.9 48.9 48.9 48.9 48.9 ...
##   ..$ Azimuthal.Gap             : num  44.2 44.2 44.2 44.2 44.2 ...
##   ..$ Horizontal.Distance       : num  3.99 3.99 3.99 3.99 3.99 ...
##   ..$ Horizontal.Error          : num  7.66 7.66 7.66 7.66 7.66 ...
##   ..$ Root.Mean.Square          : num  1.02 1.02 1.02 1.02 1.02 ...
str(test_split)
## Class 'H2OFrame' <environment: 0x00000000bae6d770> 
##  - attr(*, "op")= chr "assign"
##  - attr(*, "id")= chr "test.hex"
##  - attr(*, "nrow")= int 4593
##  - attr(*, "ncol")= int 15
##  - attr(*, "types")=List of 15
##   ..$ : chr "time"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "enum"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##   ..$ : chr "real"
##  - attr(*, "data")='data.frame': 10 obs. of  15 variables:
##   ..$ Date                      : num  -1.57e+11 -1.56e+11 -1.55e+11 -1.55e+11 -1.55e+11 ...
##   ..$ Latitude                  : num  -59.1 -24.6 37.5 51.6 52.5 ...
##   ..$ Longitude                 : num  -23.6 178.5 73.3 175.1 172 ...
##   ..$ Type                      : Factor w/ 4 levels "Earthquake","Explosion",..: 1 1 1 1 1 1 1 1 1 1
##   ..$ Depth                     : num  15 565 15 30 25 25 25 20 30 25
##   ..$ Depth.Error               : num  4.99 4.99 4.99 4.99 4.99 ...
##   ..$ Depth.Seismic.Stations    : num  275 275 275 275 275 ...
##   ..$ Magnitude                 : num  5.8 5.8 6 6 5.7 5.8 5.9 5.9 7.3 6.4
##   ..$ Magnitude.Type            : Factor w/ 11 levels "","MB","MD","MH",..: 7 7 7 7 7 7 7 7 7 7
##   ..$ Magnitude.Error           : num  0.0718 0.0718 0.0718 0.0718 0.0718 ...
##   ..$ Magnitude.Seismic.Stations: num  48.9 48.9 48.9 48.9 48.9 ...
##   ..$ Azimuthal.Gap             : num  44.2 44.2 44.2 44.2 44.2 ...
##   ..$ Horizontal.Distance       : num  3.99 3.99 3.99 3.99 3.99 ...
##   ..$ Horizontal.Error          : num  7.66 7.66 7.66 7.66 7.66 ...
##   ..$ Root.Mean.Square          : num  1.02 1.02 1.02 1.02 1.02 ...

We set Magnitude as the response variable and the remaining features as predictors.

response <- 'Magnitude'
predictors <- setdiff(names(train_split), response)

Deep learning: deep learning is a branch of machine learning algorithms inspired by the functioning of the human brain and nervous system. The input data is clamped to the input layer of the deep network. The network then uses a hierarchy of hidden layers that successively produce more compact, higher-level abstractions (representations) of the input data; it thus derives pseudo-features from the input features. Neural networks with three or more layers are considered “deep”.
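As a minimal illustration in plain R (a toy sketch with fixed random weights, not part of the H2O pipeline), a forward pass through two hidden layers is just successive matrix products followed by a non-linearity; the hidden activations play the role of the learned representations described above:

```r
# Toy forward pass through a 2-hidden-layer network (illustrative only).
# The weights are random here; in practice they are learned by training.
relu <- function(x) pmax(x, 0)

forward <- function(x, W1, W2, W_out) {
  h1 <- relu(W1 %*% x)        # first hidden representation
  h2 <- relu(W2 %*% h1)       # higher-level representation
  as.numeric(W_out %*% h2)    # single regression output
}

set.seed(1234)
W1    <- matrix(rnorm(10 * 3, sd = 0.5), nrow = 10)   # 3 inputs -> 10 neurons
W2    <- matrix(rnorm(10 * 10, sd = 0.5), nrow = 10)  # 10 -> 10 neurons
W_out <- matrix(rnorm(1 * 10, sd = 0.5), nrow = 1)    # 10 -> 1 output
forward(c(0.2, -1.0, 0.5), W1, W2, W_out)
```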

We train a first deep learning model with 5 epochs (to keep training fast) and 2 hidden layers of 10 neurons each.

model_dl <- h2o.deeplearning(x = predictors,
                          y = response,
                          training_frame = train_split,
                          validation_frame = test_split,
                          hidden = c(10, 10),
                          epochs = 5,
                          seed = 1234)
summary(model_dl)
## Model Details:
## ==============
## 
## H2ORegressionModel: deeplearning
## Model Key:  DeepLearning_model_R_1630520884914_1 
## Status of Neuron Layers: predicting Magnitude, regression, gaussian distribution, Quadratic loss, 421 weights/biases, 11,3 KB, 70.391 training samples, mini-batch size 1
##   layer units      type dropout       l1       l2 mean_rate rate_rms momentum
## 1     1    29     Input  0.00 %       NA       NA        NA       NA       NA
## 2     2    10 Rectifier  0.00 % 0.000000 0.000000  0.102458 0.290496 0.000000
## 3     3    10 Rectifier  0.00 % 0.000000 0.000000  0.001516 0.001481 0.000000
## 4     4     1    Linear      NA 0.000000 0.000000  0.000256 0.000083 0.000000
##   mean_weight weight_rms mean_bias bias_rms
## 1          NA         NA        NA       NA
## 2    0.004546   0.227564  0.509173 0.122447
## 3   -0.066283   0.275597  0.982221 0.120977
## 4   -0.003486   0.397925  0.225834 0.000000
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on temporary training frame with 10053 samples **
## 
## MSE:  0.1499645
## RMSE:  0.3872525
## MAE:  0.2743254
## RMSLE:  0.05330261
## Mean Residual Deviance :  0.1499645
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on validation data. **
## ** Metrics reported on full validation frame **
## 
## MSE:  0.1389511
## RMSE:  0.3727615
## MAE:  0.2698239
## RMSLE:  0.05167966
## Mean Residual Deviance :  0.1389511
## 
## 
## 
## 
## Scoring History: 
##             timestamp   duration training_speed  epochs iterations      samples
## 1 2021-09-01 20:28:18  0.000 sec             NA 0.00000          0     0.000000
## 2 2021-09-01 20:28:21  3.461 sec  77532 obs/sec 0.50668          1  7133.000000
## 3 2021-09-01 20:28:21  3.668 sec 272833 obs/sec 5.00007         10 70391.000000
##   training_rmse training_deviance training_mae training_r2 validation_rmse
## 1            NA                NA           NA          NA              NA
## 2       0.40281           0.16226      0.29956     0.11524         0.39083
## 3       0.38725           0.14996      0.27433     0.18227         0.37276
##   validation_deviance validation_mae validation_r2
## 1                  NA             NA            NA
## 2             0.15275        0.29660       0.08962
## 3             0.13895        0.26982       0.17184
## 
## Variable Importances: (Extract with `h2o.varimp`) 
## =================================================
## 
## Variable Importances: 
##              variable relative_importance scaled_importance percentage
## 1   Magnitude.Type.MB            1.000000          1.000000   0.063852
## 2  Magnitude.Type.MWB            0.700275          0.700275   0.044714
## 3   Magnitude.Type.ML            0.686198          0.686198   0.043815
## 4   Magnitude.Type.MW            0.678786          0.678786   0.043342
## 5 Horizontal.Distance            0.665880          0.665880   0.042518
## 
## ---
##                      variable relative_importance scaled_importance percentage
## 24                Depth.Error            0.450466          0.450466   0.028763
## 25              Azimuthal.Gap            0.440896          0.440896   0.028152
## 26                   Latitude            0.393176          0.393176   0.025105
## 27            Type.Rock Burst            0.342017          0.342017   0.021839
## 28 Magnitude.Type.missing(NA)            0.000000          0.000000   0.000000
## 29           Type.missing(NA)            0.000000          0.000000   0.000000
h2o.mse(model_dl)
## [1] 0.1499645
h2o.rmse(model_dl)
## [1] 0.3872525

Layers 2 and 3 are the two hidden layers with 10 neurons each. We use a rectifier activation function: in artificial neural networks, the rectifier, or ReLU (Rectified Linear Unit), is an activation function defined as the positive part of its argument. It is a common default for neural networks, chosen for its simplicity.
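The rectifier is simply f(x) = max(0, x); a one-line R version (for illustration only, unrelated to H2O's internal implementation):

```r
# ReLU: returns x for positive inputs, 0 otherwise (vectorised via pmax)
relu <- function(x) pmax(x, 0)
relu(c(-2, -0.5, 0, 1.5, 3))
# [1] 0.0 0.0 0.0 1.5 3.0
```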

We are training a simple model here, without dropout and without regularization.

The mean rate column shows the average (adaptive) learning rate per layer, which controls how quickly or slowly the model learns.
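The effect of the learning rate is easy to see on a one-parameter toy problem: each gradient-descent step moves the weight by `rate * gradient`, so a larger rate learns faster (but too large a rate can overshoot). An illustrative sketch, unrelated to H2O's internals:

```r
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# The minimum is at w = 3.
descend <- function(rate, steps = 20, w = 0) {
  for (i in seq_len(steps)) w <- w - rate * 2 * (w - 3)
  w
}

descend(rate = 0.01)  # slow: still far from the minimum after 20 steps
descend(rate = 0.30)  # faster: essentially at w = 3 after 20 steps
```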

Metrics

Let’s check the variable importances:

Variable importance is difficult to compute for neural networks. H2O Deep Learning implements the method of Gedeon and returns relative variable importances in descending order of importance.

The Gedeon method considers the weights connecting the input features to the first two hidden layers (for simplicity and speed).
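A simplified flavour of such weight-based importance (a toy sketch of the idea only, not H2O's actual Gedeon implementation, which also involves the second hidden layer) scores each input by the magnitude of its outgoing first-layer weights:

```r
# Toy weight-based variable importance: sum of |weights| from each input
# to the first hidden layer, rescaled so the top variable scores 1.
weight_importance <- function(W1) {
  raw <- colSums(abs(W1))               # one score per input feature
  sort(raw / max(raw), decreasing = TRUE)
}

set.seed(42)
W1 <- matrix(rnorm(5 * 3), nrow = 5,
             dimnames = list(NULL, c("Depth", "Latitude", "Longitude")))
weight_importance(W1)
```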

head(as.data.frame(h2o.varimp(model_dl)))
##              variable relative_importance scaled_importance percentage
## 1   Magnitude.Type.MB           1.0000000         1.0000000 0.06385238
## 2  Magnitude.Type.MWB           0.7002752         0.7002752 0.04471424
## 3   Magnitude.Type.ML           0.6861980         0.6861980 0.04381538
## 4   Magnitude.Type.MW           0.6787863         0.6787863 0.04334212
## 5 Horizontal.Distance           0.6658800         0.6658800 0.04251803
## 6   Magnitude.Type.MD           0.6615564         0.6615564 0.04224195
h2o.performance(model_dl)
## H2ORegressionMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on temporary training frame with 10053 samples **
## 
## MSE:  0.1499645
## RMSE:  0.3872525
## MAE:  0.2743254
## RMSLE:  0.05330261
## Mean Residual Deviance :  0.1499645

The random forest model performed better… we can try to tune the neural network. The random forest may simply be a better fit for this dataset.

Prediction on the held-out test set

pred <- h2o.predict(model_dl, newdata = test_split)
pred
##    predict
## 1 6.093479
## 2 6.001661
## 3 5.967594
## 4 5.957917
## 5 5.956545
## 6 5.959258
## 
## [4593 rows x 1 column]

extra parameters

We can use the R function `do.call()` to test different parameter settings and analyse their effect on model performance: `do.call()` lets us call `h2o.deeplearning()` with argument lists built at run time, so each combination of extra parameters is merged into a common set of fixed arguments.
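The pattern is: keep a list of fixed arguments, merge the varying ones in with `modifyList()`, and hand the result to `do.call()`. A tiny self-contained example of the same pattern, using `mean()` in place of `h2o.deeplearning()`:

```r
# do.call() + modifyList(): combine fixed and varying arguments
fixed_args <- list(x = c(1, 2, NA, 4))
run_mean <- function(extra) do.call(mean, modifyList(fixed_args, extra))

run_mean(list(na.rm = TRUE))   # mean of 1, 2, 4 -> 2.333...
run_mean(list(na.rm = FALSE))  # NA, because of the missing value
```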

# DEEP LEARNING TOPOLOGY TEST FUNCTION
run_DL_test <- function(extra_params) {
  model_test <- do.call(h2o.deeplearning, modifyList(list(x = predictors,
                                                          y = response,
                                                          training_frame = train_split,
                                                          validation_frame = test_split,
                                                          seed = 1234),
                                                          extra_params))
  
  idx <- paste(names(extra_params), extra_params, sep = "=", collapse=" ")
  
  # Model Metrics
  sampleshist <- model_test@model$scoring_history$samples
  samples <- sampleshist[length(sampleshist)]
  time <- model_test@model$run_time/1000
  
  training_MSE <- (tail(model_test@model$scoring_history$training_rmse, n=1))^2
  test_MSE <- (tail(model_test@model$scoring_history$validation_rmse, n=1))^2
  
  # Group Results
  c( idx, samples, sprintf("%.3f", time), sprintf("%.3f", training_MSE), 
     sprintf("%.3f", test_MSE)
    )
}
#https://www.kaggle.com/fraserpal/kaggle-titanic-disaster-data-set-and-h2o-framework/code
# EXPORT METRICS TO DF and CSV FILE FUNCTION
build_nn_topology_test_results <- function(results, file) {
  table <- matrix(unlist(results), ncol = 5, byrow = TRUE)
  
  colnames(table) <- c("idx", "Sample", "time", "Training_MSE", "Validation_MSE")
  #write.csv(table, file.path(workdir,file), col.names = T, row.names=F, quote=T, sep=",")
  return(as.data.frame(table))

}
# DEEP LEARNING TOPOLOGY TEST EXECUTION
# Parameters
EPOCHS = 1

NN_topology_test <- list(
list(hidden=c(32),             epochs=EPOCHS),
list(hidden=c(128),            epochs=EPOCHS),
list(hidden=c(256),            epochs=EPOCHS),
list(hidden=c(512),            epochs=EPOCHS),
list(hidden=c(32,32),          epochs=EPOCHS),
list(hidden=c(128,128),        epochs=EPOCHS),
list(hidden=c(256,256),        epochs=EPOCHS),
list(hidden=c(512,512),        epochs=EPOCHS),
list(hidden=c(32,32,32),       epochs=EPOCHS),
list(hidden=c(128,128,128),    epochs=EPOCHS),
list(hidden=c(256,256,256),    epochs=EPOCHS),
list(hidden=c(512,512,512),    epochs=EPOCHS))

# Execution
nn_topology_results_df <- build_nn_topology_test_results(lapply(NN_topology_test, run_DL_test), "network_topology_test.csv")
nn_topology_results_df
##                                 idx Sample  time Training_MSE Validation_MSE
## 1                hidden=32 epochs=1  15405 0.146        0.155          0.145
## 2               hidden=128 epochs=1  15405 0.293        0.160          0.151
## 3               hidden=256 epochs=1  15405 0.383        0.171          0.163
## 4               hidden=512 epochs=1  15405 0.615        0.159          0.149
## 5         hidden=c(32, 32) epochs=1  15405 0.158        0.157          0.152
## 6       hidden=c(128, 128) epochs=1  15405 0.643        0.149          0.138
## 7       hidden=c(256, 256) epochs=1  15405 1.118        0.153          0.144
## 8       hidden=c(512, 512) epochs=1  15405 3.143        0.153          0.144
## 9     hidden=c(32, 32, 32) epochs=1  15405 0.192        0.155          0.146
## 10 hidden=c(128, 128, 128) epochs=1  15405 1.009        0.174          0.158
## 11 hidden=c(256, 256, 256) epochs=1  15405 2.676        0.152          0.142
## 12 hidden=c(512, 512, 512) epochs=1  15405 6.701        0.161          0.152

After training on the training set, the best topology is 2 hidden layers of 128 neurons each: it achieves both the best training MSE (0.149) and the best validation MSE (0.138). Each run processes 15405 samples, and this network trains relatively fast (0.643 s).

library(ggplot2)

p<-ggplot(data=nn_topology_results_df, aes(x= idx , y= Training_MSE)) +
  geom_bar(stat="identity", width = .5, position = "dodge") +
    theme_minimal() + 
    theme(axis.text.x =  element_text(angle = 45, vjust = 1, size = 12, hjust = 1)) +
        labs(x = NULL, y = 'MSE', fill = NULL)
p

We can see here the hidden layer combination that minimizes the MSE.

Early Stopping

Now we run another, smaller network, and let it stop automatically once the MSE converges (specifically, if the moving average of length 2 does not improve by at least the stopping tolerance, here 15%, for 2 consecutive scoring events). We also sample the validation set down to 1,000 rows for faster scoring.
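The stopping rule can be paraphrased in plain R: keep a moving average of the scored MSE and stop once it has failed to improve by a relative tolerance for `rounds` consecutive scoring events. This is a simplified sketch of the logic, not H2O's actual code:

```r
# Simplified early-stopping check: stop when the moving average (window 2)
# of the metric fails to improve by at least `tol` (relative) for
# `rounds` consecutive scoring events.
should_stop <- function(scores, rounds = 2, tol = 0.01, window = 2) {
  if (length(scores) < window + rounds) return(FALSE)
  ma <- stats::filter(scores, rep(1 / window, window), sides = 1)
  ma <- ma[!is.na(ma)]
  recent <- tail(ma, rounds + 1)
  # stop if every recent improvement is smaller than the tolerance
  all(diff(recent) > -tol * head(recent, -1))
}

should_stop(c(0.20, 0.16, 0.13, 0.11))             # still improving -> FALSE
should_stop(c(0.20, 0.150, 0.149, 0.1485, 0.1482)) # converged -> TRUE
```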

m2 <- h2o.deeplearning(
  model_id="dl_model_faster", 
  training_frame=train_split, 
  validation_frame=valid_split,
  x=predictors,
  y=response,
  hidden=c(32,32,32),                 
  epochs=10000,                      
  score_validation_samples=1000,      
  stopping_rounds=2,
  stopping_metric="MSE", #
  stopping_tolerance=0.15, 
  seed = 1234
)
summary(m2)
## Model Details:
## ==============
## 
## H2ORegressionModel: deeplearning
## Model Key:  dl_model_faster 
## Status of Neuron Layers: predicting Magnitude, regression, gaussian distribution, Quadratic loss, 3.105 weights/biases, 44,2 KB, 5.100.565 training samples, mini-batch size 1
##   layer units      type dropout       l1       l2 mean_rate rate_rms momentum
## 1     1    29     Input  0.00 %       NA       NA        NA       NA       NA
## 2     2    32 Rectifier  0.00 % 0.000000 0.000000  0.095647 0.274463 0.000000
## 3     3    32 Rectifier  0.00 % 0.000000 0.000000  0.000829 0.000821 0.000000
## 4     4    32 Rectifier  0.00 % 0.000000 0.000000  0.003360 0.009185 0.000000
## 5     5     1    Linear      NA 0.000000 0.000000  0.000358 0.000196 0.000000
##   mean_weight weight_rms mean_bias bias_rms
## 1          NA         NA        NA       NA
## 2   -0.020439   0.358771  0.383155 0.248984
## 3   -0.049868   0.305645  0.942008 0.194314
## 4   -0.089162   0.402684  0.901474 0.598295
## 5   -0.033107   0.447176  0.655452 0.000000
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on temporary training frame with 10053 samples **
## 
## MSE:  0.1315968
## RMSE:  0.3627627
## MAE:  0.2513932
## RMSLE:  0.04997649
## Mean Residual Deviance :  0.1315968
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on validation data. **
## ** Metrics reported on temporary validation frame with 1010 samples **
## 
## MSE:  0.141641
## RMSE:  0.3763522
## MAE:  0.2655284
## RMSLE:  0.05219126
## Mean Residual Deviance :  0.141641
## 
## 
## 
## 
## Scoring History: 
##             timestamp   duration training_speed    epochs iterations
## 1 2021-09-01 20:28:53  0.000 sec             NA   0.00000          0
## 2 2021-09-01 20:28:53  0.528 sec 200861 obs/sec   7.11962          1
## 3 2021-09-01 20:28:58  5.800 sec 295398 obs/sec 120.79912         17
## 4 2021-09-01 20:29:04 11.070 sec 308894 obs/sec 241.53395         34
## 5 2021-09-01 20:29:09 16.163 sec 317061 obs/sec 362.30750         51
## 6 2021-09-01 20:29:09 16.184 sec 316943 obs/sec 362.30750         51
##          samples training_rmse training_deviance training_mae training_r2
## 1       0.000000            NA                NA           NA          NA
## 2  100230.000000       0.40134           0.16107      0.29303     0.12170
## 3 1700610.000000       0.36276           0.13160      0.25139     0.28243
## 4 3400315.000000       0.36459           0.13293      0.25778     0.27518
## 5 5100565.000000       0.35181           0.12377      0.25205     0.32511
## 6 5100565.000000       0.36276           0.13160      0.25139     0.28243
##   validation_rmse validation_deviance validation_mae validation_r2
## 1              NA                  NA             NA            NA
## 2         0.38912             0.15141        0.28552       0.09431
## 3         0.37635             0.14164        0.26553       0.15277
## 4         0.39162             0.15337        0.28018       0.08263
## 5         0.38615             0.14911        0.28247       0.10810
## 6         0.37635             0.14164        0.26553       0.15277
## 
## Variable Importances: (Extract with `h2o.varimp`) 
## =================================================
## 
## Variable Importances: 
##                     variable relative_importance scaled_importance percentage
## 1          Magnitude.Type.MB            1.000000          1.000000   0.076872
## 2                       Date            0.780753          0.780753   0.060018
## 3 Magnitude.Seismic.Stations            0.755005          0.755005   0.058039
## 4                      Depth            0.621378          0.621378   0.047767
## 5           Horizontal.Error            0.566507          0.566507   0.043549
## 
## ---
##                      variable relative_importance scaled_importance percentage
## 24             Type.Explosion            0.348744          0.348744   0.026809
## 25          Magnitude.Type.MH            0.322583          0.322583   0.024798
## 26            Magnitude.Error            0.299522          0.299522   0.023025
## 27            Type.Rock Burst            0.275314          0.275314   0.021164
## 28 Magnitude.Type.missing(NA)            0.000000          0.000000   0.000000
## 29           Type.missing(NA)            0.000000          0.000000   0.000000
plot(m2)

On the validation sample, the MSE is 0.1416 (RMSE 0.3764), against a training MSE of 0.1316 (RMSE 0.3628). The model generalises reasonably well: validation performance is only slightly worse than training performance, so early stopping has kept overfitting in check.

We apply no regularization and no dropout. The weight RMS is of the same order at every layer (roughly 0.3 to 0.45), while the mean bias varies from layer to layer.

Over the scoring history (one row per scoring event, not per layer), the training RMSE decreases to about 0.352 before the final scored value of 0.363.

The validation RMSE evolves less smoothly, fluctuating between 0.376 and 0.392 before settling at 0.376.

The most important variables are: Magnitude.Type.MB, Date and Magnitude.Seismic.Stations.

We can try to tune the hyperparameters in order to get a better performance.

Tuning

Hyperparameter tuning with grid search

We can use a grid combining two topologies (3 hidden layers of 32 neurons each, or 2 hidden layers of 64 neurons each), an input dropout ratio of 0 or 0.05, and a learning rate of 0.01 or 0.02.
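A Cartesian grid trains one model per combination of the listed values, so with 2 topologies × 2 dropout ratios × 2 learning rates this grid has 8 candidate models. `expand.grid()` makes the enumeration explicit (an illustration of the combinatorics only, not how `h2o.grid()` is called):

```r
# Enumerate the Cartesian grid: every combination of hyperparameter values.
# Topologies are encoded as strings here purely for display.
grid_combos <- expand.grid(
  hidden              = c("32,32,32", "64,64"),
  input_dropout_ratio = c(0, 0.05),
  rate                = c(0.01, 0.02),
  stringsAsFactors    = FALSE
)
nrow(grid_combos)  # 8 models to train
grid_combos
```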

hyper_params <- list(
  hidden=list(c(32,32,32),c(64,64)),
  input_dropout_ratio=c(0,0.05),
  rate=c(0.01,0.02)
)
hyper_params
## $hidden
## $hidden[[1]]
## [1] 32 32 32
## 
## $hidden[[2]]
## [1] 64 64
## 
## 
## $input_dropout_ratio
## [1] 0.00 0.05
## 
## $rate
## [1] 0.01 0.02
grid <- h2o.grid(
  algorithm="deeplearning",
  grid_id="dl_grid", 
  training_frame=train_split,
  validation_frame=valid_split, 
  x=predictors, 
  y=response,
  epochs=10,
  stopping_metric="MSE",
  stopping_tolerance=0.15,        
  stopping_rounds=2,
  score_validation_samples=10000, 
  score_duty_cycle=0.025,         
  adaptive_rate=F,                
  momentum_start=0.5,            
  momentum_stable=0.9, 
  momentum_ramp=1e7, 
  l1=1e-5,
  l2=1e-5,
  activation=c("Rectifier"),
  max_w2=10,                      
  hyper_params=hyper_params, 
  seed = 1234
)
## Warning in h2o.getGrid(grid_id = grid_id): Some models were not built due to a
## failure, for more details run `summary(grid_object, show_stack_traces = TRUE)`
grid <- h2o.getGrid("dl_grid",sort_by="MSE",decreasing=FALSE)
## Warning in h2o.getGrid("dl_grid", sort_by = "MSE", decreasing = FALSE):
## Some models were not built due to a failure, for more details run
## `summary(grid_object, show_stack_traces = TRUE)`
grid
## H2O Grid Details
## ================
## 
## Grid ID: dl_grid 
## Used hyper parameters: 
##   -  hidden 
##   -  input_dropout_ratio 
##   -  rate 
## Number of models: 7 
## Number of failed models: 1 
## 
## Hyper-Parameter Search Summary: ordered by increasing MSE
##         hidden input_dropout_ratio rate       model_ids                 mse
## 1 [32, 32, 32]                0.05 0.01 dl_grid_model_3  0.1506389355285432
## 2 [32, 32, 32]                 0.0 0.01 dl_grid_model_1 0.15081015982705367
## 3     [64, 64]                0.05 0.01 dl_grid_model_4 0.15498711075577287
## 4     [64, 64]                 0.0 0.01 dl_grid_model_2  0.1556678698084069
## 5 [32, 32, 32]                0.05 0.02 dl_grid_model_7 0.16020831366916713
## 6     [64, 64]                0.05 0.02 dl_grid_model_8   0.163860164679336
## 7     [64, 64]                 0.0 0.02 dl_grid_model_6 0.16614763762292067
## Failed models
## -------------
##        hidden input_dropout_ratio rate status_failed
##  [32, 32, 32]                 0.0 0.02          FAIL
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  msgs_failed
##  "DistributedException from localhost/127.0.0.1:54321: '\n\nTrying to predict with an unstable model.\nJob was aborted due to observed numerical instability (exponential growth).\nEither the weights or the bias values are unreasonably large or lead to large activation values.\nTry a different initial distribution, a bounded activation function (Tanh), adding regularization\n(via max_w2, l1, l2, dropout) or learning rate (either enable adaptive_rate or use a smaller learning rate or faster annealing).\n\nFor more information visit:\n  http://jira.h2o.ai/browse/TN-4'"
grid@summary_table[1,]
## Hyper-Parameter Search Summary: ordered by increasing MSE
##         hidden input_dropout_ratio rate       model_ids                mse
## 1 [32, 32, 32]                0.05 0.01 dl_grid_model_3 0.1506389355285432
best_model <- h2o.getModel(grid@model_ids[[1]])
best_model
## Model Details:
## ==============
## 
## H2ORegressionModel: deeplearning
## Model ID:  dl_grid_model_3 
## Status of Neuron Layers: predicting Magnitude, regression, gaussian distribution, Quadratic loss, 3.105 weights/biases, 31,7 KB, 140.780 training samples, mini-batch size 1
##   layer units      type dropout       l1       l2 mean_rate rate_rms momentum
## 1     1    29     Input  5.00 %       NA       NA        NA       NA       NA
## 2     2    32 Rectifier  0.00 % 0.000010 0.000010  0.008766 0.000000 0.505631
## 3     3    32 Rectifier  0.00 % 0.000010 0.000010  0.008766 0.000000 0.505631
## 4     4    32 Rectifier  0.00 % 0.000010 0.000010  0.008766 0.000000 0.505631
## 5     5     1    Linear      NA 0.000010 0.000010  0.008766 0.000000 0.505631
##   mean_weight weight_rms mean_bias bias_rms
## 1          NA         NA        NA       NA
## 2   -0.034896   0.276289  0.198727 0.290874
## 3   -0.100428   0.210535  0.786129 0.111492
## 4   -0.095609   0.193557  0.647517 0.119430
## 5   -0.019188   0.136456  0.379797 0.000000
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on temporary training frame with 10053 samples **
## 
## MSE:  0.1492478
## RMSE:  0.386326
## MAE:  0.2904757
## RMSLE:  0.05345216
## Mean Residual Deviance :  0.1492478
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on validation data. **
## ** Metrics reported on full validation frame **
## 
## MSE:  0.1506389
## RMSE:  0.3881223
## MAE:  0.293015
## RMSLE:  0.05370129
## Mean Residual Deviance :  0.1506389

Tuning with this grid, the best model reaches an RMSE of about 0.388 on the validation set, which is not a real improvement. There is little sign of overfitting here: the training RMSE (0.386) is almost identical to the validation RMSE.

The best model is the one with 3 hidden layers of 32 neurons each, an input dropout ratio of 0.05 and a learning rate of 0.01.

Random search

We can try a random search; it may give us better results. With a random search, parameter combinations are drawn randomly, typically up to a maximum number of models (“max models”). It can be effective here because we have more than 4 parameters to tune.
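Random search draws each trial's hyperparameters independently instead of enumerating every combination. A plain-R sketch of the sampling step (an illustration of the idea; in H2O the equivalent is requested via a `search_criteria` argument rather than written by hand):

```r
# Draw one random hyperparameter combination from discrete candidate lists
sample_params <- function(hyper_params) {
  lapply(hyper_params, function(values) values[[sample(length(values), 1)]])
}

hyper_params <- list(
  activation = c("Rectifier", "Tanh", "Maxout"),
  hidden     = list(c(20, 20), c(50, 50), c(30, 30, 30)),
  l1         = seq(0, 1e-4, 1e-6)
)

set.seed(1234)
sample_params(hyper_params)  # one random draw = one trial's configuration
```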

hyper_params <- list(
  activation=c("Rectifier","Tanh","Maxout","RectifierWithDropout","TanhWithDropout","MaxoutWithDropout"),
  hidden=list(c(20,20),c(50,50),c(30,30,30),c(25,25,25,25)),
  input_dropout_ratio=c(0,0.05),
  l1=seq(0,1e-4,1e-6),
  l2=seq(0,1e-4,1e-6)
)
hyper_params
## $activation
## [1] "Rectifier"            "Tanh"                 "Maxout"              
## [4] "RectifierWithDropout" "TanhWithDropout"      "MaxoutWithDropout"   
## 
## $hidden
## $hidden[[1]]
## [1] 20 20
## 
## $hidden[[2]]
## [1] 50 50
## 
## $hidden[[3]]
## [1] 30 30 30
## 
## $hidden[[4]]
## [1] 25 25 25 25
## 
## 
## $input_dropout_ratio
## [1] 0.00 0.05
## 
## $l1
##   [1] 0.0e+00 1.0e-06 2.0e-06 3.0e-06 4.0e-06 5.0e-06 6.0e-06 7.0e-06 8.0e-06
##  [10] 9.0e-06 1.0e-05 1.1e-05 1.2e-05 1.3e-05 1.4e-05 1.5e-05 1.6e-05 1.7e-05
##  [19] 1.8e-05 1.9e-05 2.0e-05 2.1e-05 2.2e-05 2.3e-05 2.4e-05 2.5e-05 2.6e-05
##  [28] 2.7e-05 2.8e-05 2.9e-05 3.0e-05 3.1e-05 3.2e-05 3.3e-05 3.4e-05 3.5e-05
##  [37] 3.6e-05 3.7e-05 3.8e-05 3.9e-05 4.0e-05 4.1e-05 4.2e-05 4.3e-05 4.4e-05
##  [46] 4.5e-05 4.6e-05 4.7e-05 4.8e-05 4.9e-05 5.0e-05 5.1e-05 5.2e-05 5.3e-05
##  [55] 5.4e-05 5.5e-05 5.6e-05 5.7e-05 5.8e-05 5.9e-05 6.0e-05 6.1e-05 6.2e-05
##  [64] 6.3e-05 6.4e-05 6.5e-05 6.6e-05 6.7e-05 6.8e-05 6.9e-05 7.0e-05 7.1e-05
##  [73] 7.2e-05 7.3e-05 7.4e-05 7.5e-05 7.6e-05 7.7e-05 7.8e-05 7.9e-05 8.0e-05
##  [82] 8.1e-05 8.2e-05 8.3e-05 8.4e-05 8.5e-05 8.6e-05 8.7e-05 8.8e-05 8.9e-05
##  [91] 9.0e-05 9.1e-05 9.2e-05 9.3e-05 9.4e-05 9.5e-05 9.6e-05 9.7e-05 9.8e-05
## [100] 9.9e-05 1.0e-04
## 
## $l2
##   [1] 0.0e+00 1.0e-06 2.0e-06 3.0e-06 4.0e-06 5.0e-06 6.0e-06 7.0e-06 8.0e-06
##  [10] 9.0e-06 1.0e-05 1.1e-05 1.2e-05 1.3e-05 1.4e-05 1.5e-05 1.6e-05 1.7e-05
##  [19] 1.8e-05 1.9e-05 2.0e-05 2.1e-05 2.2e-05 2.3e-05 2.4e-05 2.5e-05 2.6e-05
##  [28] 2.7e-05 2.8e-05 2.9e-05 3.0e-05 3.1e-05 3.2e-05 3.3e-05 3.4e-05 3.5e-05
##  [37] 3.6e-05 3.7e-05 3.8e-05 3.9e-05 4.0e-05 4.1e-05 4.2e-05 4.3e-05 4.4e-05
##  [46] 4.5e-05 4.6e-05 4.7e-05 4.8e-05 4.9e-05 5.0e-05 5.1e-05 5.2e-05 5.3e-05
##  [55] 5.4e-05 5.5e-05 5.6e-05 5.7e-05 5.8e-05 5.9e-05 6.0e-05 6.1e-05 6.2e-05
##  [64] 6.3e-05 6.4e-05 6.5e-05 6.6e-05 6.7e-05 6.8e-05 6.9e-05 7.0e-05 7.1e-05
##  [73] 7.2e-05 7.3e-05 7.4e-05 7.5e-05 7.6e-05 7.7e-05 7.8e-05 7.9e-05 8.0e-05
##  [82] 8.1e-05 8.2e-05 8.3e-05 8.4e-05 8.5e-05 8.6e-05 8.7e-05 8.8e-05 8.9e-05
##  [91] 9.0e-05 9.1e-05 9.2e-05 9.3e-05 9.4e-05 9.5e-05 9.6e-05 9.7e-05 9.8e-05
## [100] 9.9e-05 1.0e-04
search_criteria = list(
  strategy = "RandomDiscrete",
  max_runtime_secs = 360,
  max_models = 100,
  seed = 1234567,
  stopping_rounds = 5,
  stopping_tolerance = 1e-2
)
dl_random_grid <- h2o.grid(
  algorithm="deeplearning",
  grid_id = "dl_grid_random",
  training_frame=train_split,
  validation_frame=valid_split, 
  x=predictors, 
  y=response,
  epochs=1,
  stopping_metric="MSE",
  stopping_tolerance=0.15,       
  stopping_rounds=2,
  score_validation_samples=1000, 
  score_duty_cycle=0.025,         
  max_w2=10,                      
  hyper_params = hyper_params,
  search_criteria = search_criteria
)                                
grid <- h2o.getGrid("dl_grid_random",sort_by="MSE",decreasing=FALSE)
grid
## H2O Grid Details
## ================
## 
## Grid ID: dl_grid_random 
## Used hyper parameters: 
##   -  activation 
##   -  hidden 
##   -  input_dropout_ratio 
##   -  l1 
##   -  l2 
## Number of models: 100 
## Number of failed models: 0 
## 
## Hyper-Parameter Search Summary: ordered by increasing MSE
##          activation       hidden input_dropout_ratio     l1     l2
## 1         Rectifier [30, 30, 30]                 0.0 7.0E-6 3.9E-5
## 2         Rectifier     [50, 50]                 0.0 2.0E-5 4.9E-5
## 3         Rectifier     [20, 20]                 0.0 3.9E-5 3.2E-5
## 4 MaxoutWithDropout     [20, 20]                 0.0 1.2E-5 1.0E-4
## 5 MaxoutWithDropout     [50, 50]                 0.0 5.0E-6 8.5E-5
##                 model_ids                 mse
## 1 dl_grid_random_model_99 0.13049102834724308
## 2 dl_grid_random_model_66 0.13202767782314578
## 3 dl_grid_random_model_55 0.13504235537217396
## 4 dl_grid_random_model_13  0.1357002069826429
## 5 dl_grid_random_model_29 0.13852280040801768
## 
## ---
##               activation           hidden input_dropout_ratio     l1     l2
## 95  RectifierWithDropout [25, 25, 25, 25]                 0.0 1.4E-5 6.2E-5
## 96  RectifierWithDropout         [20, 20]                 0.0 3.3E-5 9.9E-5
## 97  RectifierWithDropout     [30, 30, 30]                0.05 5.0E-6 1.2E-5
## 98                Maxout [25, 25, 25, 25]                0.05 7.3E-5 7.4E-5
## 99                Maxout [25, 25, 25, 25]                0.05 1.9E-5 8.4E-5
## 100 RectifierWithDropout         [20, 20]                0.05 8.7E-5 4.5E-5
##                   model_ids                 mse
## 95  dl_grid_random_model_47 0.18551703624874358
## 96  dl_grid_random_model_71 0.18839161974958143
## 97   dl_grid_random_model_3  0.1884772678953924
## 98  dl_grid_random_model_58 0.18862631803132252
## 99   dl_grid_random_model_5  0.1930043580881467
## 100 dl_grid_random_model_54  0.2057656910544259
grid@summary_table[1,]
## Hyper-Parameter Search Summary: ordered by increasing MSE
##   activation       hidden input_dropout_ratio     l1     l2
## 1  Rectifier [30, 30, 30]                 0.0 7.0E-6 3.9E-5
##                 model_ids                 mse
## 1 dl_grid_random_model_99 0.13049102834724308
best_model <- h2o.getModel(grid@model_ids[[1]]) ## model with lowest MSE
best_model
## Model Details:
## ==============
## 
## H2ORegressionModel: deeplearning
## Model ID:  dl_grid_random_model_99 
## Status of Neuron Layers: predicting Magnitude, regression, gaussian distribution, Quadratic loss, 2.791 weights/biases, 40,5 KB, 14.137 training samples, mini-batch size 1
##   layer units      type dropout       l1       l2 mean_rate rate_rms momentum
## 1     1    29     Input  0.00 %       NA       NA        NA       NA       NA
## 2     2    30 Rectifier  0.00 % 0.000007 0.000039  0.110032 0.307156 0.000000
## 3     3    30 Rectifier  0.00 % 0.000007 0.000039  0.001538 0.000960 0.000000
## 4     4    30 Rectifier  0.00 % 0.000007 0.000039  0.007235 0.025226 0.000000
## 5     5     1    Linear      NA 0.000007 0.000039  0.000471 0.001285 0.000000
##   mean_weight weight_rms mean_bias bias_rms
## 1          NA         NA        NA       NA
## 2    0.001962   0.190603  0.481094 0.042222
## 3   -0.010725   0.188213  0.974532 0.029551
## 4   -0.002069   0.185790  0.988172 0.015161
## 5    0.029567   0.269485 -0.003963 0.000000
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on training data. **
## ** Metrics reported on temporary training frame with 10044 samples **
## 
## MSE:  0.1515254
## RMSE:  0.3892627
## MAE:  0.2957811
## RMSLE:  0.05402443
## Mean Residual Deviance :  0.1515254
## 
## 
## H2ORegressionMetrics: deeplearning
## ** Reported on validation data. **
## ** Metrics reported on temporary validation frame with 1030 samples **
## 
## MSE:  0.130491
## RMSE:  0.3612354
## MAE:  0.2794253
## RMSLE:  0.05061631
## Mean Residual Deviance :  0.130491

We get better results here with the random search, which makes sense, because we try more combinations within the defined search space. However, the random search takes more time to complete and is more complicated to set up than the grid search.
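To make this concrete, the full Cartesian grid implied by the hyper_params list above is far larger than the 100 models the random search actually trains. A rough count (assuming exactly the hyper_params values shown earlier):

```r
# Size of the full Cartesian grid defined by hyper_params above:
# 6 activations x 4 hidden layouts x 2 input dropout ratios
# x 101 l1 values x 101 l2 values
n_combos <- 6 * 4 * 2 * 101 * 101
n_combos  # 489648 candidate models, versus max_models = 100 in the random search
```

An exhaustive grid search over this space would be hopeless, which is why sampling 100 random combinations is a sensible compromise.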

The best model is the one with 3 hidden layers of 30 neurons each, no dropout, and a little l1 and l2 regularization.

Neural networks are not as easy to interpret as tree-based models: they are a kind of black box. We get a result, but we don't know exactly how the decision was made.
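One way to peek inside the black box is H2O's variable importance for deep learning models, which is based on the Gedeon method (cited in the sources below). A minimal sketch, assuming best_model from the random search above is still in the H2O session; note that deep learning models only carry importances if they were requested at training time:

```r
# Variable importances for the deep learning model (Gedeon method).
# Caveat: for deeplearning models, variable_importances = TRUE must have
# been set when training, otherwise h2o.varimp() returns NULL.
vi <- h2o.varimp(best_model)
head(vi)  # top predictors by relative importance

# Bar plot of the ten most important predictors
h2o.varimp_plot(best_model, num_of_features = 10)
```

This does not fully explain individual predictions, but it at least shows which features the network relies on most.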

The random forest model is still the best one in terms of performance and parsimony.
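To compare the models fairly, both should be scored on the same held-out frame. A sketch, assuming a held-out frame test_split and the earlier random forest model rf_model exist in the session (both names are assumptions here, not objects defined in this section):

```r
# Score both models on the same held-out frame and compare RMSE
perf_dl <- h2o.performance(best_model, newdata = test_split)
perf_rf <- h2o.performance(rf_model, newdata = test_split)

h2o.rmse(perf_dl)  # deep learning RMSE
h2o.rmse(perf_rf)  # random forest RMSE
```

Comparing metrics computed on the same data avoids the trap of judging one model on its training frame and the other on validation data.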

Is it possible to predict the magnitude of an earthquake?

As the USGS explains (here: https://www.usgs.gov/faqs/can-you-predict-earthquakes?qt-news_science_products=0#qt-news_science_products ), it is not possible to predict earthquakes or their magnitude; no major earthquake has ever been predicted by a scientist. However, earthquake analysis can help improve building design in order to limit earthquake damage and save lives.

Sources:

USGS, "Can you predict earthquakes?": https://www.usgs.gov/faqs/can-you-predict-earthquakes?qt-news_science_products=0#qt-news_science_products

Kaggle, Titanic disaster data set and the H2O framework: https://www.kaggle.com/fraserpal/kaggle-titanic-disaster-data-set-and-h2o-framework/code

Gedeon, variable importance: https://arxiv.org/pdf/1805.04755.pdf

H2O deep learning tutorial: https://docs.h2o.ai/h2o-tutorials/latest-stable/tutorials/deeplearning/index.html

Deep learning with H2O in R: https://www.kdnuggets.com/2018/01/deep-learning-h2o-using-r.html

Decision trees compared to regression and neural networks: https://www.dtreg.com/methodology/view/decision-trees-compared-to-regression-and-neural-networks

ANOVA: https://slidetodoc.com/anova-analysis-of-variance-the-titanic-disaster-were/

Visualisations: https://jtr13.github.io/cc21/d-plot-in-r.html

Seisms: https://en.wikipedia.org/wiki/Seismic_magnitude_scales
https://www.usgs.gov/faqs/what-does-it-mean-earthquake-occurred-a-depth-0-km-how-can-earthquake-have-a-negative-depth?qt-news_science_products=0#qt-news_science_products
https://www.britannica.com/science/earthquake-geology/Earthquake-magnitude
http://www.earthquakes.bgs.ac.uk/education/faqs/faq15.html
http://www.isc.ac.uk/docs/papers/download/2002p01/